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Abstract 

Lately,  there  has  been  an  extensive  interest  in  the  possible  uses  of  neural  networks  for 
nonlinear  system  identification  and  control.  In  this  paper,  we  provide  a  framework  for  the 
simultaneous  identification  and  control  of  a  class  of  unknown,  uncertain  nonlinear  systems. 
The  identification  portion  relies  on  modeling  the  system  by  a  neural  network  which  is  trained 
via  a  local  variant  of  the  Extended  Kalman  Filter.  We  will  discuss  this  local  algorithm  for 
training  a  neural  network  to  approximate  a  nonlinear  feedback  system.  We  also  give  a 
dynamic  programming-based  method  of  deriving  near  optimal  control  inputs  for  the  real 
plant  based  on  this  approximation  and  a  measure  of  its  error  (covariance).  Finally,  we 
combine  these  methods  in  a  hierarchical  algorithm  for  identification  and  control  of  a  class 
of  uncertain,  unknown  systems.  The  complexity  of  the  whole  algorithm  is  analyzed. 
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1  Introduction 


Lately,  there  has  been  an  extensive  interest  in  the  possible  uses  of  neural  networks  for  non¬ 
linear  system  identification  and  control  [7,  3,  6].  Typically,  these  approaches  use  multilayer 
feedforward  neural  networks  with  various  backpropagation  learning  rules. 

Recurrent  neural  networks  have  thus  far  enjoyed  a  disproportionately  low  level  of  attention. 
That  is  mainly  because  the  delays  internal  to  such  a  model  make  traditional  backpropagation 
learning  ineffective.  Matthews  and  Moschytz  [5]  suggest  that  just  as  multilayer  perceptrons 
have  been  shown  to  be  good  approximators  for  nonlinear  feedforward  systems,  so  can  recurrent 
neural  networks  for  nonlinear  feedback  systems. 

In  particular,  Matthews  and  Moschytz  first  suggested  the  use  of  the  Extended  Kalman  Filter 
(EKF)  algorithm  for  training  neural  networks  [5].  Later  Livstone,  Farrell,  and  Baker  showed 
how  to  localize  the  EKF  algorithm  to  make  the  learning  more  computationally  efficient  [4].  The 
Spatially  Localized  Extended  Kalman  (SLEK)  algorithm  of  [4]  forms  the  basis  of  this  paper. 

We  will  first  discuss  the  SLEK  algorithm  for  training  a  neural  network  to  approximate  a 
nonlinear  feedback  system.  We  next  give  a  dynamic  programming-based  method  of  deriving 
near  optimal  control  inputs  for  the  real  plant  based  on  this  approximation  and  a  meastxre  of  its 
error  (covariance).  Combining  these  methods  leads  to  a  hierarchical  algorithm  for  identification 
and  control  of  a  class  of  uncertain,  unknown  systems.  Finally,  the  complexity  of  the  whole 
algorithm  is  analyzed. 

2  Identification  and  Control  of  Unknown,  Uncertain  Systems 

2.1  Approximating  the  System 

The  algorithms  in  this  paper  are  written  to  handle  the  case  where 

*[*  +  1]  =  /(*[*]>«[*])+  /kn0wn(z[fr],«[fc])  +  ff(£[fc]) 
y[fe]  =  h(x[k])  +  v[k] 

where  x,£  G  Rn,  u  G  Rm,  y,v  G  Rp  and  /,  /kn0wn,  g,  h  are  continuous,  possibly  nonlinear, 
functions  on  the  associated  spaces.  We  assume  /known)  9i  and  h  are  known;  /  is  unknown. 
Further,  £[fc]  and  u[fc]  are  uncorrelated  white  Gaussian  noise  sequences  that  are  also  uncorrelated 
with  the  initial  condition  *[0]. 

However,  for  ease  of  presentation,  we  will  treat  the  special  case  where  the  system  dynamics 
are  modeled  as: 


x[k  +  1]  =  /(*[&],  u[fc])  +  £[fc] 

y[k]  =  x[k]  +  v[fc] 

where  x,£,v,y  G  Rn,  u  G  Rm,  and  /  G  C[Rn+m,  i?n];  /  is  unknown.  Further,  £[fr]  and  t>[fc] 
are  uncorrelated  white  Gaussian  noise  sequences  with  covariances 

=  Q[k]hn 

E{v[k}vT[n}}  =  R[k]6kn 
£{£[fc]ur[n]}  =  0 
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and  £[fc],t>[&]  are  uncorrelated  with  the  initial  state  *[0].  Note  that  the  meastirement  equation 
is  linear.  The  nonlinear  state  equation  will  be  approximated  by  the  network 

X[k+  1]  S3  /net(*[fc],tt[fe];0[fc]) 

where  0[k ]  are  network  parameters  to  be  learned.  Let  <?[&]  =  [ir[fe],  uT[k]]T.  We  will  refer  to 
this  Rn+m  vector  space  as  the  state-input  space.  With  this  notation  we  see 

/n«t,i(«[fc];  *[*])=  ^2  Aijgj(q[k))  (1) 

i=i 

where  the  gj  are  radial  basis  functions  (RBFs)  and  Aij,  the  magnitudes  or  weights ,  are  the 
parameters  in  0[k].  In  this  paper,  we  will  use  Gaussian  RBFs  to  simplify  calculations: 

ffj(«[*])  =  exP  {-(#]  -  ci)TS-1(g[*]  -  cj)} 

where  the  covariance  matrix  S  =  diag(st ,  s2, . . . ,  sn+m )  and  cj  is  the  jth  mean  or  center.  Finally, 
we  have  the  parameter  vector 

0  —  [Au,A2i,  •  •  • ?  Ani,Ai2,  A22,  •  ■  ■ ,  An 2, . . . ,  Aijv,  A2Ni  •  •  • » AnN)T 
Thus,  0  is  a  vector  in  RnN . 

Implicitly,  we  have  assumed  that  the  state-space  is  really  S  C  Rn  and  the  input  space  is 
C  C  Rm,  where  S  and  C  are  compact  sets  surrounding  the  region  of  interest  in  state  and  input 
space,  respectively.  For  many  applications — e.g.,  motor  control  or  robotics — the  state  space  is 
inherently  compact  (e.g.,  angles,  velocity  saturation,  kinematic  constraints).  In  practice,  one 
would  use  a  safety  net  controller  (usually  based  on  that  portion  of  dynamics  that  is  known  a 
priori)  to  insure  that  the  state  remains  within  S. 

2.2  SLEK  Algorithm 

One  could  now  use  the  EKF  algorithm  to  simultaneuously  approximate  the  state  and  learn 
the  parameters  (here,  weights)  [5].  However,  this  requires  0(n3  +  (nN)3)  operations  per  step 
(assuming  no  cross- correlation  between  the  state  and  the  parameters). 

Instead,  we  wish  to  use  a  localized  version  of  the  EKF  to  update  at  each  step  only  those  pa¬ 
rameters  which  are  significant  to  the  model  at  that  point.  When  the  underlying  basis  functions 
have  compact  support  (or  die  quickly)  as  in  the  case  of  Guassian  RBFs,  this  strategy  can  lead 
to  an  algorithm  which  is  more  computationally  tractable.  The  so-called  Spatially  Localized 
Extended  Kalman  Filter  (SLEK)  algorithm  of  [4]  implements  such  a  strategy. 

To  localize,  we  need  to  create  a  generalized  nearest  neighbor  map: 

Te  :  Rn+m  — >  power  set  of  {1,  2, ...,  nN} 


That  is,  Te(q)  is  a  map  from  a  point  in  the  state-input,  space  to  the  indices  of  those  parameters 
that  are  significant  to  the  approximation  at  that  point.  Next,  we  will  let  </>i0cai(<7)  denote  the 
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vector  (matrix)  obtained  by  ordering  those  elements  of  the  vector  (matrix)  <j>  which  are  “picked 
off”  by  Tt(q).  For  example,  #i0cat(<7)  €  Rp  where  p  is  the  cardinality  of  Tt(q)  and  #iOCai ,j(q)  =  $k 
if  k  is  the  jth  smallest  element  of  Te(q).  Likewise,  the  matrix  0i„cai(9)  would  be  p  x  p  with 
Qij  part  of  0iocai(g)  if  {*’,  j}  C  Te(q)  (with  the  obvious  ordering).  Finally,  in  a  slight  abuse  of 
notation,  we  will  say  0;  £  0iOCal(<?)  whenever  i  £  Te(q).  Specifically,  we  will  say  Aij  £  0iocal(<z) 
whenever  ij  £  Te(q). 

To  ease  exposition  and  the  complexity  analysis,  we  will  consider  the  nearest  neighbor  map 
that  gives  the  two  closest  centers  in  each  dimension.  In  other  words,  if  di  is  the  spacing  of  the 
centers  of  the  RBFs  in  dimension  i  of  the  state-input  space  and  if  cj  is  the  center  vector  of  the 
j t.li  RBF, 

^local(?)  =  {■‘'Ifcj  :  1 1i  ~  cj,i\  ^  ^t} 

where  k  =  1, 2, . . . ,  n,  j  =  1, 2, . . . ,  N,  and  i  =  1,  2, . . . ,  n  +  m.  Note  that  this  can  be  made 
consistent  with  our  general  ( Te(q ))  notation  by  picking 

{n+m 

-  53  (di/s^2 
i= 1 

So  now  the  sum  in  (1)  is  taken  over  j  such  that  Atj  £  0iOcai(<z[&])-  Next,  let’s  construct  an 
augmented  state  vector  with  the  local  parameters  of  0: 

-j-  X j 

0ioCai[fc  + 1  ]{q[k]) 


fnet{x[k],u[k}) 

_L 

m 

0iocai[fc](g[fc]) 

l 

*/local[&] 

=  jF(z[&],  u[fe])  -f  «?[&] 


z[k  -f  1]  — 


where 


E{v[k]vT[n]}  =  0[fc]^fen 
EMkifin}}  =  0 


so  that 


=  Skn 


Q[k]  0 

0  01ocal[&] 


If  we  linearize  the  new  system  about  the  current  conditions  we  see  that 


Sz[k  +  1]  «  ,A[fc]£z[fc]  +  +  tuffc] 


where 


A[k)  = 
B[k]  = 


dF_ 

dz 

d£ 

du 


£[fc|/c],u[fe] 

z[fc|fe],u[fc] 


dfnrt/dx  dfnet/ddlocal 

0  I 


S[fe|fc],u[fc] 


[dfnet/du] 

£[fc|fc],u[fc] 
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where  z[k\k]  is  our  best  current  estimate  of  the  state  provided  by  the  SLEK  algorithm  (see 
below).  With  some  straightforward  calculations,  we  see  that 


&fnet,i 


dqi[k } 


[^net,  (i, «)(#])  -«[*]/net,i(?[fc])] 


where 

hn't,(i,i){q[k])  =  £  AijCjjgjiqik}) 

i:A(,€«local(q[fe]) 


is  a  network  with  the  same  RBFs  but  with  scaled  weights.  We  also  have: 


dfneUi  _  j  0,  9j[k\  $local(<?[&]) 

d9j{k)  q[fc]  “  \  gi{q[k]),  9j[k]  €  0iocai (?[&]),/  =  \j/n] 


The  measurement  equation  is  y[k ]  =  [/,  0]2[fc]  +  u[fc]. 

Now,  we  will  use  the  SLEK  algorithm  to  estimate  the  augmented  state  vector.  The  algorithm 
is  as  follows  (see  [4,  1]): 


z[k\k  —  1]  = 

E[fe|fc-1]  = 

H[k]  = 
= 

E[*|fc]  = 


F{z[k  -  l\k-  l],u[Jfe-  1]) 

A[k  -  l]E[Jb  -  l|Jfe  -  l].lr[fc  -  1]  + 


0 

0  ©local[k] 


E[fc|Jb-  1]CT  [cs[k\k  -  1}CT  +  R[k]\  1 
z[k\k  -  1]  +  H[k]  [y[fc]  -  CF{z[k\k  -  l],u[fe])] 
[I  -  H[k]C]S[k\k  -  1] 


Note  that  the  calculations  above  can  be  performed  much  more  efficiently  than  the  general 
n  +  2n  Kalman  Filter  algorithm  because  of  the  special  form  of  C  and  S[-|-]:  C  —  [Jn,  0n,2»]  and 
£[.|-]  =  blockdiag(Ei[*|-],  E2[’|-]),  where  are  symmetric  and  have  dimensions  n  x  n  and 

2n  X  2n,  respectively. 

The  successive  updates  of  the  SLEK  algorithm  attempt  to  estimate  the  augmented  state 
vector,  part  of  which  are  the  local  parameters  of  the  neural  network.  Thus  the  SLEK  algorithm 
trains  the  neural  network  by  finding  the  weights  necessary  to  express  the  dynamic  nonlinearity 
as  a  local  sum  of  Gaussians  (assuming  such  a  decomposition  is  possible). 


2.3  Near  Optimal  Control  with  an  Approximate  Model 

Now  we  proceed  with  the  control  problem.  At  every  discrete  point  in  time  we  have  an  estimate 
of  where  the  system  will  go  if  we  were  to  apply  a  specific  input.  This  estimate  is  provided  to  us 
by  the  SLEK  algorithm  along  with  a  measure  of  its  uncertainty,  the  covariance  (Si[fe|fc]).  Let 

G  :  Rn+m  ->  R 

be  a  (non- negative)  cost  function.  For  a  discount  factor  a  G  (0, 1)  we  define 

J(*[0])  =  E|53«'G(a:[i],u[i]) 

v  t=0 
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to  be  the  penalty  function  for  the  infinite  horizon  problem.  So  we  want  to 


mm 

u[°],u[l],u[2],... 


with 


Pr  {x[k  +  1]  =  y\q[k]) 


K(n,8)  \^[k\kr^  Tk(y),  ||/net(<?[*];  0[k])  -  t/IU  <  8 


0, 


otherwise 


where  |.A|  is  the  determinant  of  matrix  A  and  K  (n,  8)  is  a  constant  that  normalizes  the  truncated 
Gaussian  pdf  to  integrate  to  unity. 

Tk{y)  -  exp  j-i  [y  -  /net(cjf[fc];^[fc])]:r(S1[fc|fc])-1  [y  -  /net(#];  #[£])]  j 

and  8  comes  from  discretizing  the  state  space.  In  particular,  if  S  C  Rn  was  the  state  space 
before,  we  now  consider  the  ^-lattice  that  spans  S ,  called  S6 .  Thus  we  substitute  S  with  a  grid 
of  “fineness”  8.  From  the  definition  of  the  probability,  we  only  allow  nonzero  probability  of  the 
two  closest  neighbors  in  each  dimension  of  the  state  estimate  (though,  as  with  Tc  before,  more 
general  maps  can  be  formulated).  This  is  a  convention  for  computational  tractability. 

From  now  on,  the  input  space  is  also  assumed  to  be  discrete  (for  practical  considerations). 
In  particular  the  input  space  is  C  C  Rm-  The  discretization,  C6' ,  need  not  have  8'  =  8  because 
it  arises  from  different  considerations. 

We  will  use  dynamic  programming  to  solve  the  above  control  problem.  In  particular,  we 
will  choose: 

u[k  +  1]  =  argmin<  G(a:[A:],  u)  +  a  Y]  Pr(y|a:[fe],  u)  j(y)  1 

““  l  V6S*  J 


where  J(y)  is  the  estimated  cost-to-go.  So  u[k  +  1]  is  the  solution  of 

dPr(y|g[fc]) 


dG_ 

du 


4k+i] 


y€Ss 


du 


u[fe+l] 


J(y) 


(2) 


We  now  digress  to  discuss  how  to  estimate  the  cost-to-go.  Let  B(Ss)  be  the  set  of  bounded 
continuous  functions  on  Ss.  Consider  the  following  operator: 


T  :  B(Ss) 


B(Ss) 


TJ(x )  =  min  ^  G(a:,  u)  +  a  Y]  Pr(j/|g[fc])J(j/) 

”ec  l  »i?< 


J 


This  operator  can  be  proven  to  be  a  contraction  [2].  So,  since  B(Ss)  is  complete,  T  has  a 
unique  fixed  point,  which  is  the  solution  we  seek,  i.e.,  the  real  cost-to-go.  So,  if  we  start  from 
some  approximation  Jq{x),  we  can  iterate: 


=  TJi(x) 
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One  could  even  use  fancier  techniques  to  obtain  tight  bounds  on  the  error  at  every  iteration 
( e.g .,  see  [8]).  We  will  not  concern  ourselves  with  the  specifics  of  this  approximation  in  this 
paper. 

Going  back  to  solving  (2),  we  see  that  it  is  not  a  straightforward  task.  So  we  will  iterate  u 
to  find  u[k  -f  1].  We  will  first  introduce  some  notation: 


Also,  define  the  function 


—  [Cr,n+1?  •  •  •  i  Cr,ti+m] 

d  —  •  .  . ,  C?n+m] 

Br  =  {»6«m|||»-c;||0O<Irf'} 


pr:Rm  -  Bt 

pT(x)  =  arg  min  ||z  —  j/|| 
y(:BT 


Then 


ilP+1)(!C)  =  Pr  I  u{P\x)  +  /3 


dG 


du 


+  o  V 
“ipl(*)  yes* 


t?Pr(t/|a:,  u) 


du 


«lp,(*) 


Jk(y) 


with  /3  <  0,  where 

dPr(y|a:,u) 


du 


4P)(«) 


=  Pr(y|*,4p)(a:))Afe(y) 


&k(y)  =  [y  -  fn't{x,ulp)(xy,0[k])}T(Y,1[k\k])  1 2 * * 


4p)(x) 


Above,  p  is  the  index  of  iteration  and  r  shows  that  this  gradient  search  is  implemented  in 
parallel  Nm  times,  where  N  is  the  number  of  RBFs  per  input  dimension.  In  particular,  for 
each  r  we  will  start  the  corresponding  search  from  c'r  and  limit  it  to  the  ||  •  ||oo  ball  around  it 
of  radius  d' . 


2.4  The  Hierarchical  Algorithm 

We  now  describe  the  overall  algorithm  (see  Figure  1): 

0.  Initialize  the  algorithm  with  k  =  0  and  u[-l]  =  0;  of  each  D.  P.  r  at  c'r\  and  the 

cost-to-go  estimate  at  some  Jq(x),  for  all  x  €  Ss 

1.  Given  y[k  —  1],  u[k  —  1],  run  SLEK  at  the  top  level,  updating  /net,  hnt t,  and  Ei [k  —  l|fc  —  1] 
and  send  them  to  the  lower  level. 

2.  Given  fne t,  hnet,  and  Si [k  —  l|fc  -  1],  run  at  the  D.  P.  command  level  for  each  x  G  <S6 

=  <?(*,Umin(*))  +  a  ^2  Pr(y|®,umin(a:))jfc_2(y) 

yESe 
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3.  Run  (for  a  prespecified  length  of  time  or  number  of  steps,  or  until  convergence)  for  each 
x  £  Ss  each  D.  P.  r  starting  at  u*(a:)  from  the  previous  iteration. 

4.  Send  the  new  ti*(a:)’s  to  the  upper  D.  P.  level  which  recalculates 


=  arg  min 
*;(«) 


<?(*,<$(*))+«  Y,  Pr(j/|MC(*))«4-l(t/) 

yes6 


5.  Set  u[fc]  =  —  1])  and  send  to  the  plant. 

6.  Increase  k  and  Go  back  to  Step  1. 

There  are  some  points  we  wish  to  make  about  this  algorithm.  In  practice,  we  suggest  that 
the  SLEK  algorithm  be  run  first,  with  random  inputs,  for  a  number  of  iterations  in  order  for 
the  network  to  begin  converging.  Then,  the  search  for  umin(a:)  can  be  localized  after  a  limited 
number  of  D.  P.  iterations.  Thus,  the  number  of  required  computations  will  decrease  rapidly. 


3  Computational  Complexity 

We  will  now  discuss  some  computational  complexity  considerations.  Let’s  assume  we  use  N 
Gaussian  RBFs  in  each  dimension  of  the  state-input  space.  Then  each  iteration  of  u£'*(:c)  at 
D.  P.  r  takes  0(n39n)  operations.  At  the  upper  level,  each  iteration  of  the  SLEK  algorithm 
takes  0(n3  -f  8n)  operations.  Also,  we  need  0(Nmn3 3")  operations  to  calculate  wmin(x). 

4  Conclusions 

To  conclude,  we  have  presented  a  hierarchical  algorithm  based  on  training  recurrent  networks 
for  nonlinear  control.  The  Extended  Kalman  Filter  (EKF)  algorithm  and  its  localized  variant 
(SLEK)  play  a  central  role,  both  in  the  training  of  the  neural  network  and  in  the  dynamic 
programming  iterations  to  find  the  near  optimal  input  at  any  time:  The  SLEK  algorithm 
is  used  for  training  a  neural  network  to  approximate  a  nonlinear  feedback  system;  we  use  a 
dynamic  programming-based  method  of  deriving  near  optimal  control  inputs  for  the  real  plant 
based  on  this  approximation  and  a  measure  of  its  error  (covariance).  Finally,  we  combine 
these  methods  in  a  hierarchical  algorithm  for  identification  and  control  of  a  class  of  uncertain, 
unknown  systems.  We  also  provide  computational  complexities  for  the  different  calculations 
required  for  the  algorithm. 
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Figure  1:  Hierarchical  Controller 
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